Query for databases not scraped properly?

Hi,

with this yml configuration file:

#
# **LDAP2PG SAMPLE CONFIGURATION**
#
# This is a sample starting point configuration file for ldap2pg.yml. Including
# static roles, groups, privilege and LDAP query.
#
# This configuration assumes the following principles:
#
# - All LDAP users are grouped in `ldap_roles` group.
# - Read privileges are granted to `readers` group.
# - Write privileges are granted to `writers` group.
# - DDL privileges are granted to `owners` group.
# - We have one or more databases with public and maybe a schema.
# - Grants are not specific to a schema. Once you're writer in a database, you
#   are writer to all schemas in it.
#
# Adapt to your needs! See also full documentation on how to configure ldap2pg
# at https://ldap2pg.readthedocs.io/en/latest/config/.
#

verbosity: 5

ldap:
  #HA ldap handling
  uri: "ldap://ldap.example.com"
  binddn: uid=testuser,cn=users,cn=accounts,dc=example,dc=com
  password: password

postgres:
  dsn: postgres://user:password@demodb.xxxxxxxxxxxxxxx.amazonaws.com:5432/demodb
  # Scope the database where to purge objects when dropping roles. This is the
  # scope of grant on `__all__` databases.
  #databases_query: [postgres, appdb, olddb]
  database_query: |
    SELECT datname FROM pg_database
    WHERE datallowconn IS TRUE
    AND datname != 'rdsadmin' AND datname != 'template0' AND datname != 'template1';
  # List of managed schemas. This skips pg_toast, pg_temp1, etc. but not pg_catalog.
  schemas_query: |
    SELECT nspname FROM pg_catalog.pg_namespace
    WHERE nspname NOT LIKE 'pg_%' AND nspname NOT LIKE 'information_schema';
  # Return managed roles which can be dropped or revoked.
  managed_roles_query: |
    SELECT 'public'
    UNION
    SELECT DISTINCT role.rolname
    FROM pg_roles AS role
    LEFT OUTER JOIN pg_auth_members AS ms ON ms.member = role.oid
    LEFT OUTER JOIN pg_roles AS ldap_roles
      ON ldap_roles.rolname = 'ldap_roles' AND ldap_roles.oid = ms.roleid
    WHERE role.rolname IN ('ldap_roles', 'readers', 'writers', 'owners')
        OR ldap_roles.oid IS NOT NULL
    ORDER BY 1;

  # Since readers/writer/owners groups are globals, we have a global
  # owners_query.
  owners_query: |
    SELECT DISTINCT role.rolname
    FROM pg_catalog.pg_roles AS role
    JOIN pg_catalog.pg_auth_members AS ms ON ms.member = role.oid
    JOIN pg_catalog.pg_roles AS owners
      ON owners.rolname = 'owners' AND owners.oid = ms.roleid
    ORDER BY 1;


privileges:
  # Define a privilege group `ro` with read-only grants
  ro:
  - __connect__
  - __execute__
  - __select_on_tables__
  - __select_on_sequences__
  - __usage_on_schemas__
  - __usage_on_types__

  # `rw` privilege group lists write-only grants
  rw:
  - __temporary__
  - __all_on_tables__
  - __all_on_sequences__

  # `ddl` privilege group lists DDL only grants.
  ddl:
  - __create_on_schemas__


sync_map:
# First, setup static roles and grants
- roles:
  - names:
    - ldap_roles
    - readers
    options: NOLOGIN
    comment: Custom static comment.
  - name: writers
    # Grant reading to writers
    parent: readers
    options: NOLOGIN
  - name: owners
    # Grant read/write to owners
    parent: writers
    options: NOLOGIN
  # Now grant privileges to each group
  grant:
  - privilege: ro
    role: readers
    # Let everyone see pg_catalog
    schema: __all__
  - privilege: rw
    role: writers
    # But prevent writers from writing in pg_catalog
    schema: public
  # Allow ddl to create tables in public only
  - privilege: ddl
    role: owners
    schema: public
  # owners must have write access to pg_catalog
  - privilege: rw
    role: owners
    schema: pg_catalog
  # Grants on specific schema appdb.appns:
  - privilege: rw
    role: writers
    database: appdb
    schema: appns
  - privilege: ddl
    role: owners
    database: appdb
    schema: appns

# Now query LDAP to create roles and grant them privileges by parenting.
- ldap:
    base: cn=groups,cn=accounts,dc=example,dc=com
    filter: "(cn=dba)"
  role:
    name: '{member.cn}'
    options: LOGIN SUPERUSER
    parent:
    - ldap_roles
    - owners
    comment: "Custom comment from LDAP: {dn}"
- ldap:
    base: cn=groups,cn=accounts,dc=example,dc=com
    filter: "(cn=app*)"
  role:
    name: '{member.cn}'
    options: LOGIN
    parent:
    - ldap_roles
    - writers
    on_unexpected_dn: warn
- ldap:
    base: cn=groups,cn=accounts,dc=example,dc=com
    filter: |
      (&
        (cn=bi)
        (objectClass=*)
      )
  role:
    name: '{member.cn}'
    options: LOGIN
    parent:
    - ldap_roles
    - readers

I get a connection attempt on the “rdsadmin” database, which should be excluded by my databases_query.

Error message:

[ldap2pg.script       ERROR] Unhandled error:
[ldap2pg.script       ERROR] Traceback (most recent call last):
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/ldap2pg/script.py", line 94, in main
[ldap2pg.script       ERROR]     exit(wrapped_main(config))
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/ldap2pg/script.py", line 70, in wrapped_main
[ldap2pg.script       ERROR]     count = manager.sync(syncmap=config['sync_map'])
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/ldap2pg/manager.py", line 236, in sync
[ldap2pg.script       ERROR]     schemas = self.inspector.fetch_schemas(databases, ldaproles)
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/ldap2pg/inspector.py", line 233, in fetch_schemas
[ldap2pg.script       ERROR]     for dbname, psql in self.psql.itersessions(databases):
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/ldap2pg/psql.py", line 73, in itersessions
[ldap2pg.script       ERROR]     with self(dbname) as session:
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/ldap2pg/psql.py", line 142, in __enter__
[ldap2pg.script       ERROR]     self.conn = psycopg2.connect(self.connstring)
[ldap2pg.script       ERROR]   File "/usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py", line 130, in connect
[ldap2pg.script       ERROR]     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
[ldap2pg.script       ERROR] psycopg2.OperationalError: FATAL:  pg_hba.conf rejects connection for host "X.X.X.X", user "username", database "rdsadmin", SSL on
[ldap2pg.script       ERROR] FATAL:  pg_hba.conf rejects connection for host "X.X.X.X", user "username", database "rdsadmin", SSL off
[ldap2pg.script       ERROR] Please file an issue at https://github.com/dalibo/ldap2pg/issues with full log.
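
Reading the traceback, the failure happens while schemas are fetched: one session is opened per database name, and the connection to "rdsadmin" is rejected. A minimal sketch of that pattern follows. It is not ldap2pg's code; `fake_connect`, `iter_sessions`, `visit_all`, and the database lists are hypothetical stand-ins used only to show why a single unwanted name in the list aborts the whole run.

```python
# Illustrative only: NOT ldap2pg's code. All names here are hypothetical.


class ConnectionRejected(Exception):
    """Stand-in for psycopg2.OperationalError in this sketch."""


def fake_connect(dbname):
    # Assumption: "rdsadmin" refuses connections from regular users.
    if dbname == "rdsadmin":
        raise ConnectionRejected(
            'pg_hba.conf rejects connection for database "%s"' % dbname)
    return "session:%s" % dbname


def iter_sessions(databases, connect=fake_connect):
    # Open one session per database name, in order.
    for dbname in databases:
        yield dbname, connect(dbname)


def visit_all(databases):
    # Return the names visited before the first rejected connection,
    # plus the name that failed (or None if every connection worked).
    visited = []
    try:
        for dbname, _session in iter_sessions(databases):
            visited.append(dbname)
    except ConnectionRejected:
        return visited, databases[len(visited)]
    return visited, None


# A list that still contains rdsadmin versus a filtered one.
unfiltered = ["demodb", "postgres", "rdsadmin"]
filtered = [d for d in unfiltered if d != "rdsadmin"]

print(visit_all(unfiltered))  # (['demodb', 'postgres'], 'rdsadmin')
print(visit_all(filtered))    # (['demodb', 'postgres'], None)
```

So the error itself only says that "rdsadmin" was in the list of databases to inspect; which query produced that list is a separate question.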

Trying to understand your code, I added some logging in python3.5/dist-packages/ldap2pg/inspector.py:

import logging


class PostgresInspector(object):
    def __init__(
            self, psql=None, privileges=None, roles_blacklist=None,
            shared_queries=None, **queries):
        self.psql = psql
        self.privileges = privileges or {}
        self.shared_queries = shared_queries or {}
        self.queries = queries
        # Added line: format the type and content of the queries dict.
        msg = ("MARKER {q} {c}").format(q=type(self.queries), c=self.queries)
        # Added line: log it at ERROR level so it always shows up.
        logging.error(msg)

Error message output:

[root                 ERROR] MARKER <class 'dict'> {'all_roles': 'SELECT\n  role.rolname, array_agg(members.rolname) AS members, {options}\nFROM\n  pg_catalog.pg_roles AS role\nLEFT JOIN pg_catalog.pg_auth_members ON roleid = role.oid\nLEFT JOIN pg_catalog.pg_roles AS members ON members.oid = member\nGROUP BY role.rolname, {options}\nORDER BY 1;\n', 'databases': 'SELECT datname FROM pg_catalog.pg_database\nWHERE datallowconn IS TRUE ORDER BY 1;\n', 'managed_roles': "SELECT 'public'\nUNION\nSELECT DISTINCT role.rolname\nFROM pg_roles AS role\nLEFT OUTER JOIN pg_auth_members AS ms ON ms.member = role.oid\nLEFT OUTER JOIN pg_roles AS ldap_roles\n  ON ldap_roles.rolname = 'ldap_roles' AND ldap_roles.oid = ms.roleid\nWHERE role.rolname IN ('ldap_roles', 'readers', 'writers', 'owners')\n    OR ldap_roles.oid IS NOT NULL\nORDER BY 1;\n", 'owners': "SELECT DISTINCT role.rolname\nFROM pg_catalog.pg_roles AS role\nJOIN pg_catalog.pg_auth_members AS ms ON ms.member = role.oid\nJOIN pg_catalog.pg_roles AS owners\n  ON owners.rolname = 'owners' AND owners.oid = ms.roleid\nORDER BY 1;\n", 'schemas': "SELECT nspname FROM pg_catalog.pg_namespace\nWHERE nspname NOT LIKE 'pg_%' AND nspname NOT LIKE 'information_schema';\n"}

The 'databases' query seems to be the default one defined in config.py (lines 348-350), not the one from my configuration file: 'databases': 'SELECT datname FROM pg_catalog.pg_database\nWHERE datallowconn IS TRUE ORDER BY 1;\n'

Am I missing something here?

Other parameters in the conf file seem to be scraped correctly.
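
One detail worth checking: the configuration above spells the key `database_query`, while the commented line just before it and the description use `databases_query`. Whether that is the cause here is an assumption, but a loader that only reads the keys it knows would produce this symptom: the misspelled key is skipped and the default query is kept. The sketch below is not ldap2pg's loader; `DEFAULTS`, `merge_known_keys`, and `unknown_keys` are hypothetical names used for illustration.

```python
# Illustrative only: NOT ldap2pg's loader. Names are hypothetical.

DEFAULT_DATABASES_QUERY = (
    "SELECT datname FROM pg_catalog.pg_database\n"
    "WHERE datallowconn IS TRUE ORDER BY 1;\n"
)

# Assumption: the loader knows a fixed set of keys, each with a default.
DEFAULTS = {"databases_query": DEFAULT_DATABASES_QUERY}


def merge_known_keys(user_section, defaults=DEFAULTS):
    # Start from the defaults and override only keys the loader knows about.
    merged = dict(defaults)
    for key in defaults:
        if key in user_section:
            merged[key] = user_section[key]
    return merged


def unknown_keys(user_section, known):
    # Keys present in the file that such a loader would silently skip.
    return sorted(set(user_section) - set(known))


# Key spelled as in the configuration above (no "s" after "database").
user_postgres = {
    "database_query": "SELECT datname FROM pg_database\n"
                      "WHERE datname != 'rdsadmin';\n",
}

merged = merge_known_keys(user_postgres)
print(merged["databases_query"] == DEFAULT_DATABASES_QUERY)  # True
print(unknown_keys(user_postgres, DEFAULTS))  # ['database_query']
```

Reporting the result of something like `unknown_keys` as a warning would make this kind of typo visible right away.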

And by the way, why are you doing your own mapping and not using something like: https://pyyaml.org/wiki/PyYAML ?

Thanks in advance.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:5 (5 by maintainers)

Top GitHub Comments

Stanislasss commented, Jan 28, 2019 (1 reaction)

@bersace Nice work. Thanks for the great follow-up. Closing this.

Stanislasss commented, Jan 11, 2019 (0 reactions)

Sure thing, I’ll leave the ticket open until then and let you know about my setup experience. 😉
